Approach for camera control

ABSTRACT

An approach is provided for a user interface for enabling control of a camera. In one example, a method includes the following: displaying a tone mapped high dynamic range (HDR) image on a user interface device of the camera; receiving user edits via an input device associated with the user interface device; sending the user edits to one or more back-end devices of the camera to perform processing operations based on the user edits; receiving an updated tone mapped HDR image from the one or more back-end devices, wherein the updated tone mapped HDR image is generated from the processing operations performed based on the user edits; and displaying the updated tone mapped HDR image on the user interface as the camera lens continues to capture frames of the scene for the one or more back-end devices to perform operations that iteratively affect the updated tone mapped HDR image.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of U.S. provisional patent application Ser. No. 61/745,198, filed Dec. 23, 2012, which is hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to camera capture and, more specifically, to an approach for camera control.

2. Description of the Related Art

Mobile devices having a digital camera, a display, adequate computational power, and a touch interface are becoming increasingly commonplace, and increasingly powerful. More photographs are captured by mobile devices now than ever, and many of them are edited directly on the device and shared directly from that device, without ever being uploaded to PCs. This phenomenon is well reflected in the recent focus on camera control and image processing on mobile platforms, and also in the popularity of photography apps on smart phones.

A typical digital camera, whether a feature-rich digital single lens reflex (DSLR) or a point-and-shoot device, relies on a set of knobs and buttons to control capture parameters. In a standalone photographic pipeline, the user selects a predefined shooting mode that specifies the camera metering strategy (e.g., daylight, night-mode, spot-mode, panorama, macro-photography, etc.) and captures an image, while potentially adjusting capture parameters with sliders or dials. Then, as an optional post-processing step, the user performs edits to correct for undesired metering settings (e.g., the picture or specific regions are over- or under-exposed, synthetic blur is added to the background to emphasize the foreground, etc.). This approach, resulting from almost a century of photography evolution, is effective but poses some difficulties for inexperienced camera users. Point-and-shoot cameras tend to produce sub-optimal images due to primitive metering strategies that do not reflect the user's intentions, while DSLRs are difficult to operate without in-depth knowledge of photography. On top of that, a direct port of the knob-and-button interface does not fully utilize the potential of the touch-based interface available on many mobile devices.

Early photographers could not directly see the results as they were taking photos, but had to imagine the results as a function of various imaging parameters such as exposure, focus, and even the choice of film and paper that were used. Digital cameras with real-time digital displays that show a preview image have made photography much easier in this respect. Framing the image and choosing the timing of capture is made easier and more fun as the camera gives a preview of what the captured image will look like. However, when using many computational photography techniques the user still needs to imagine the result of, for example, combining an exposure burst into a high dynamic range (HDR) image and tone-mapping the HDR image back to low dynamic range (LDR) for display, rather than seeing an approximation of the end result in a digital viewfinder. Similar limitations apply also to traditional digital photography. Many photographers edit their photographs after capture, using tools such as Photoshop or Lightroom. Unfortunately, users must capture the shot in the field before knowing the effect such later edits might have on the result. The capture process thus remains separated from the image editing process, potentially leading to inadequate data acquisition (e.g., wrong composition or insufficient signal-to-noise ratio (SNR)) or excessive data acquisition (e.g., longer capture time, exacerbated handshake or motion blur, and increased cost of storage or transmission).

Accordingly, typical digital cameras provide a digital viewfinder with a somewhat faithful depiction of the final image. If, however, the image is created from a burst of differently captured images, or non-linear interactive edits have a significant contribution to the final outcome, the photographer cannot directly see the results, but needs to imagine the post-processing effects.

Accordingly, what is needed is a camera that enables capturing a scene that more accurately represents the user's intent at the moment of actuating the shutter.

SUMMARY OF THE INVENTION

One implementation of the present approach includes a method for a user interface for enabling control of a camera. In one example, the method includes the following: displaying a tone mapped high dynamic range (HDR) image on a user interface device of the camera, wherein the user interface includes a plurality of pixels defining a display surface, and wherein the tone mapped HDR image includes an interpretation of a scene at which a camera lens of the camera is pointing; receiving user edits via an input device associated with the user interface device; sending the user edits to one or more back-end devices of the camera to perform processing operations based on the user edits; receiving an updated tone mapped HDR image from the one or more back-end devices, wherein the updated tone mapped HDR image is generated from the processing operations performed based on the user edits; and displaying the updated tone mapped HDR image on the user interface as the camera lens continues to capture frames of the scene for the one or more back-end devices to perform operations that iteratively affect the updated tone mapped HDR image.

The present approach provides at least two advantages over conventional approaches. One advantage is that the camera control system enables the user to make better decisions regarding image composition, since the camera control system enables the user to be aware of how the user's pending edits will affect the image that is captured at the moment the shutter is actuated. The viewfinder serves as a pre-visualization tool in this regard, providing an enhanced user experience. Another advantage is that the camera control system carries out routines that better utilize the capture parameters, such as focus, exposure, gain, white balance, and so on.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features of the invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIG. 1 is a block diagram illustrating a camera system configured to implement one or more aspects of the present invention.

FIG. 2 is a block diagram illustrating a parallel processing subsystem, according to one embodiment of the present invention.

FIG. 3 is a block diagram of the camera system including a camera control system, according to one embodiment of the present invention.

FIG. 4A is a conceptual diagram of a camera system during an initial stage of camera control operations, according to one embodiment of the present invention.

FIG. 4B is a conceptual diagram of the camera system while a user is performing real-time editing on the user interface device, according to one embodiment of the present invention.

FIG. 4C is a conceptual diagram of the camera control system during real-time editing following operations of FIG. 4B, according to one embodiment of the present invention.

FIG. 4D is a conceptual diagram of the camera control system during real-time editing following operations of FIG. 4C, according to one embodiment of the present invention.

FIG. 4E is a conceptual diagram of the camera control system during real-time editing following operations of FIG. 4D, according to one embodiment of the present invention.

FIG. 5 is a flowchart of method steps for controlling a camera, according to one embodiment of the present invention.

FIG. 6A is a diagram illustrating a case (a) of an edit-based metering via per-pixel analysis, according to one embodiment of the present invention.

FIG. 6B is a diagram illustrating a case (b) of an edit-based metering via per-pixel analysis, according to one embodiment of the present invention.

FIG. 6C is a diagram illustrating an aggregation of per-pixel objectives, according to one embodiment of the present invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the present invention.

Among other things, embodiments of the present invention are directed towards camera control, including a new class of algorithms for determining camera capture parameters (auto-focus, auto-exposure, auto-white-balance, etc.). Existing camera systems rely on sliders, dials, and heuristic algorithms to adjust parameters. Such an approach, though functional, is sub-optimal for touch-based user interfaces and supports only global changes to the viewfinder stream. Embodiments of the present invention, on the other hand, enable spatially localized metering and enable the user to compose the look and feel of the photograph through a set of edits applied directly on the viewfinder image. The underlying optimization framework ensures that the real-time execution of camera processing fulfills both user-defined appearance and image quality constraints.

Hardware Overview

FIG. 1 is a block diagram illustrating a camera system 100 configured to implement one or more aspects of the present invention. FIG. 1 in no way limits or is intended to limit the scope of the present invention. System 100 may be a digital camera, tablet computer, laptop computer, smart phone, mobile phone, mobile device, personal digital assistant, personal computer or any other device suitable for practicing one or more embodiments of the present invention. A device is hardware or a combination of hardware and software. A component is typically a part of a device and is hardware or a combination of hardware and software.

Camera system 100 includes a central processing unit (CPU) 102 and a system memory 104 that includes a device driver 103. CPU 102 and system memory 104 communicate via an interconnection path that may include a memory bridge 105. Memory bridge 105, which may be, for example, a Northbridge chip, is connected via a bus or other communication path 106 (e.g., a HyperTransport link, etc.) to an input/output (I/O) bridge 107. I/O bridge 107, which may be, for example, a Southbridge chip, receives user input from one or more user input devices 108 (e.g., touch screen, cursor pad, keyboard, mouse, etc.) and forwards the input to CPU 102 via path 106 and memory bridge 105. A parallel processing subsystem 112 is coupled to memory bridge 105 via a bus or other communication path 113 (e.g., peripheral component interconnect (PCI) Express, Accelerated Graphics Port (AGP), and/or HyperTransport link, etc.). In one implementation, parallel processing subsystem 112 is a graphics subsystem that delivers pixels to a display device 110 (e.g., a conventional cathode ray tube (CRT) and/or liquid crystal display (LCD) based monitor, etc.). A system disk 114 is also connected to I/O bridge 107. A switch 116 provides connections between I/O bridge 107 and other components such as a network adapter 118 and various add-in cards 120 and 121. Other components (not explicitly shown), including universal serial bus (USB) and/or other port connections, compact disc (CD) drives, digital video disc (DVD) drives, film recording devices, and the like, may also be connected to I/O bridge 107. Communication paths interconnecting the various components in FIG. 1 may be implemented using any suitable protocols, such as PCI, PCI Express (PCIe), AGP, HyperTransport, and/or any other bus or point-to-point communication protocol(s), and connections between different devices may use different protocols, as is known in the art.

As further described below with reference to FIGS. 2-5, parallel processing subsystem 112 includes parallel processing units (PPUs) configured to execute a software application (e.g., device driver 103) by using circuitry that enables camera control. The PPUs exchange data with CPU 102 across communication path 113 using packet types specified by the communication protocol used by communication path 113. In situations where a new packet type is introduced into the communication protocol (e.g., due to an enhancement to the communication protocol), parallel processing subsystem 112 can be configured to generate packets based on the new packet type and to exchange data with CPU 102 (or other processing units) across communication path 113 using the new packet type.

In one implementation, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU). In another implementation, the parallel processing subsystem 112 incorporates circuitry optimized for general purpose processing, while preserving the underlying computational architecture, described in greater detail herein. In yet another implementation, the parallel processing subsystem 112 may be integrated with one or more other system elements, such as the memory bridge 105, CPU 102, and I/O bridge 107, to form a system on chip (SoC).

It will be appreciated that the system shown herein is illustrative and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of parallel processing subsystems 112, may be modified as desired. For instance, in some implementations, system memory 104 is connected to CPU 102 directly rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, parallel processing subsystem 112 is connected to I/O bridge 107 or directly to CPU 102, rather than to memory bridge 105. In still other implementations, I/O bridge 107 and memory bridge 105 might be integrated into a single chip. Large implementations may include two or more CPUs 102 and two or more parallel processing subsystems 112. The particular components shown herein are optional; for instance, any number of add-in cards or peripheral devices might be supported. In some implementations, switch 116 is eliminated, and network adapter 118 and add-in cards 120, 121 connect directly to I/O bridge 107.

FIG. 2 is a block diagram illustrating a parallel processing subsystem 112, according to one embodiment of the present invention. As shown, parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202, each of which is coupled to a local parallel processing (PP) memory 204. In general, a parallel processing subsystem includes a number U of PPUs, where U≧1. (Herein, multiple instances of like objects are denoted with reference numbers identifying the object and parenthetical numbers identifying the instance where needed.) PPUs 202 and parallel processing memories 204 may be implemented using one or more integrated circuit devices, such as programmable processors, application specific integrated circuits (ASICs), or memory devices, or in any other technically feasible fashion.

Referring again to FIG. 1, in some implementations, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like. In some implementations, parallel processing subsystem 112 may include one or more PPUs 202 that operate as graphics processors and one or more other PPUs 202 that are used for general-purpose computations. The PPUs may be identical or different, and each PPU may have its own dedicated parallel processing memory device(s) or no dedicated parallel processing memory device(s). One or more PPUs 202 may output data to display device 110, or each PPU 202 may output data to one or more display devices 110.

In operation, CPU 102 is the master processor of camera system 100, controlling and coordinating operations of other system components. In particular, CPU 102 issues commands that control the operation of PPUs 202. In some implementations, CPU 102 writes a stream of commands for each PPU 202 to a pushbuffer (not explicitly shown in either FIG. 1 or FIG. 2) that may be located in system memory 104, parallel processing memory 204, or another storage location accessible to both CPU 102 and PPU 202. PPU 202 reads the command stream from the pushbuffer and then executes commands asynchronously relative to the operation of CPU 102.

Referring back now to FIG. 2, each PPU 202 includes an I/O unit 205 that communicates with the rest of camera system 100 via communication path 113, which connects to memory bridge 105 (or, in one alternative implementation, directly to CPU 102). The connection of PPU 202 to the rest of camera system 100 may also be varied. In some implementations, parallel processing subsystem 112 is implemented as an add-in card that can be inserted into an expansion slot of camera system 100. In other implementations, a PPU 202 can be integrated on a single chip with a bus bridge, such as memory bridge 105 or I/O bridge 107. In still other implementations, some or all elements of PPU 202 may be integrated on a single chip with CPU 102.

In one implementation, communication path 113 is a PCIe link, in which dedicated lanes are allocated to each PPU 202, as is known in the art. Other communication paths may also be used. For example, a contraflow interconnect may be used to implement the communication path 113, as well as any other communication path within the camera system 100, CPU 102, or PPU 202. An I/O unit 205 generates packets (or other signals) for transmission on communication path 113 and also receives all incoming packets (or other signals) from communication path 113, directing the incoming packets to appropriate components of PPU 202. For example, commands related to processing tasks may be directed to a host interface 206, while commands related to memory operations (e.g., reading from or writing to parallel processing memory 204) may be directed to a memory crossbar unit 210. Host interface 206 reads each pushbuffer and outputs the work specified by the pushbuffer to a front end 212.

Each PPU 202 advantageously implements a highly parallel processing architecture. As shown in detail, PPU 202(0) includes an arithmetic subsystem 230 that includes a number C of general processing clusters (GPCs) 208, where C≧1. Each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program. In various applications, different GPCs 208 may be allocated for processing different types of programs or for performing different types of computations. The allocation of GPCs 208 may vary dependent on the workload arising for each type of program or computation.

GPCs 208 receive processing tasks to be executed via a work distribution unit 200, which receives commands defining processing tasks from front end unit 212. Front end 212 ensures that GPCs 208 are configured to a valid state before the processing specified by the pushbuffers is initiated.

When PPU 202 is used for graphics processing, for example, the processing workload for operation can be divided into approximately equal sized tasks to enable distribution of the operations to multiple GPCs 208. A work distribution unit 200 may be configured to produce tasks at a frequency capable of providing tasks to multiple GPCs 208 for processing. In one implementation, the work distribution unit 200 can produce tasks fast enough to keep multiple GPCs 208 busy simultaneously. By contrast, in conventional systems, processing is typically performed by a single processing engine, while the other processing engines remain idle, waiting for the single processing engine to complete tasks before beginning their processing tasks. In some implementations of the present invention, portions of GPCs 208 are configured to perform different types of processing. For example, a first portion may be configured to perform vertex shading and topology generation. A second portion may be configured to perform tessellation and geometry shading. A third portion may be configured to perform pixel shading in screen space to produce a rendered image. Intermediate data produced by GPCs 208 may be stored in buffers to enable the intermediate data to be transmitted between GPCs 208 for further processing.

Memory interface 214 includes a number D of partition units 215 that are each directly coupled to a portion of parallel processing memory 204, where D≧1. As shown, the number of partition units 215 generally equals the number of DRAMs 220. In other implementations, the number of partition units 215 may not equal the number of memory devices. Dynamic random access memories (DRAMs) 220 may be replaced by other suitable storage devices and can be of generally conventional design. Render targets, such as frame buffers or texture maps, may be stored across DRAMs 220, enabling partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of parallel processing memory 204.

Any one of GPCs 208 may process data to be written to any of the DRAMs 220 within parallel processing memory 204. Crossbar unit 210 is configured to route the output of each GPC 208 to the input of any partition unit 215 or to another GPC 208 for further processing. GPCs 208 communicate with memory interface 214 through crossbar unit 210 to read from or write to various external memory devices. In one implementation, crossbar unit 210 has a connection to memory interface 214 to communicate with I/O unit 205, as well as a connection to local parallel processing memory 204, thereby enabling the processing cores within the different GPCs 208 to communicate with system memory 104 or other memory that is not local to PPU 202. In the implementation shown in FIG. 2, crossbar unit 210 is directly connected with I/O unit 205. Crossbar unit 210 may use virtual channels to separate traffic streams between the GPCs 208 and partition units 215.

Again, GPCs 208 can be programmed to execute processing tasks relating to a wide variety of applications, including, but not limited to, linear and nonlinear data transforms, filtering of video and/or audio data, modeling operations (e.g., applying laws of physics to determine position, velocity and other attributes of objects), image rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or pixel shader programs), and so on. PPUs 202 may transfer data from system memory 104 and/or local parallel processing memories 204 into internal (on-chip) memory, process the data, and write result data back to system memory 104 and/or local parallel processing memories 204, where such data can be accessed by other system components, including CPU 102 or another parallel processing subsystem 112.

A PPU 202 may be provided with any amount of local parallel processing memory 204, including no local memory, and may use local memory and system memory in any combination. For instance, a PPU 202 can be a graphics processor in a unified memory architecture (UMA) implementation. In such implementations, little or no dedicated graphics (parallel processing) memory would be provided, and PPU 202 would use system memory exclusively or almost exclusively. In UMA implementations, a PPU 202 may be integrated into a bridge chip or processor chip, or provided as a discrete chip with a high-speed link (e.g., PCIe) connecting the PPU 202 to system memory via a bridge chip or other communication means.

As noted above, any number of PPUs 202 can be included in a parallel processing subsystem 112. For instance, multiple PPUs 202 can be provided on a single add-in card, or multiple add-in cards can be connected to communication path 113, or one or more of PPUs 202 can be integrated into a bridge chip. PPUs 202 in a multi-PPU system may be identical to or different from one another. For instance, different PPUs 202 might have different numbers of processing cores, different amounts of local parallel processing memory, and so on. Where multiple PPUs 202 are present, those PPUs may be operated in parallel to process data at a higher throughput than is possible with a single PPU 202. Systems incorporating one or more PPUs 202 may be implemented in a variety of configurations and form factors, including desktop, laptop, or handheld personal computers, servers, workstations, game consoles, embedded systems, and the like.

One embodiment of the invention may be implemented as a program product for use on a computer system, such as the camera system 100 of FIG. 1, for example. One or more programs of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored.

Overview of a Camera Control System

The current approach harnesses the processing power of mobile devices to enable a desktop-like workflow on digital cameras and on mobile devices (e.g., cell phones) having digital cameras. The camera control system uses these processing capabilities to introduce the notion of real-time viewfinder editing, which enables the user to perform edits directly on the viewfinder prior to capture. The camera control system brings a WYSIWYG interface to computational photography applications, enabling the user to see directly the effects of interactive edits on the viewfinder. Using this interface, the camera control system also gathers information on which aspects of the image are important to the user, which in turn affects capture parameter selection, such as the number of images to capture and the values for exposure, focus, white-balance, and so forth. To realize this philosophy, the camera control system uses a unified framework in which the user provides input (e.g., sparse, stroke-based input, etc.) to control local or global tone, color, saturation, and focus, among other parameters. The user then receives immediate feedback on the viewfinder.

The selections the user provides are affinity-based. The camera control system stores the selections as a sparsely sampled function over the image-patch space. The camera control system then propagates selections to subsequent viewfinder frames by matching image patches. The camera control system applies the edits both to the viewfinder image and to the high-resolution image(s) the user eventually decides to capture (e.g., by actuating the shutter). Also, the camera control system internally uses the edits to drive the camera control routines that determine the appropriate exposure and/or focus value(s), etc. The user can even provide inconsistent cues, which can then be satisfied by taking two images with different settings and combining the result.
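
As an illustrative, non-limiting sketch of this affinity-based bookkeeping, the edits can be held in a sparse table keyed by a quantized patch descriptor. The simplified three-component descriptor, the quantization step, and all names below are assumptions for exposition only; the full descriptor and the lattice-based store actually contemplated are described later in this disclosure.

    import numpy as np

    def patch_descriptor(gray, x, y, size=8):
        # Simplified stand-in: mean and mean first-order derivatives of
        # the patch's log-luminance (the full descriptor appears later).
        p = np.log1p(gray[y:y + size, x:x + size].astype(np.float64))
        return np.array([p.mean(),
                         np.diff(p, axis=0).mean(),
                         np.diff(p, axis=1).mean()])

    class EditStore:
        # Sparse map from quantized patch descriptors to edit values.
        # Values are written only for patches the user strokes; lookups
        # in later frames propagate them to similar-looking patches,
        # with no tracking of image regions involved.
        def __init__(self, quantization=0.25, default=0.0):
            self.q = quantization
            self.default = default
            self.table = {}

        def _key(self, desc):
            return tuple(np.round(desc / self.q).astype(int))

        def write(self, desc, value):
            self.table[self._key(desc)] = value

        def lookup(self, desc):
            return self.table.get(self._key(desc), self.default)

Because the key is a function of patch appearance rather than of pixel coordinates, a patch that moves or is briefly occluded still retrieves its edit when it reappears.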

The camera control system enables the user to make better decisions regarding composition because the camera control system makes the user aware of how the user's pending edits will affect the captured photograph. The camera control system serves as a pre-visualization tool in this regard, providing a novel user experience. Also, the system enables the camera control routines to better optimize the capture parameters, such as focus, exposure, gain, and white balance. The user is expressing how the edits will transform the image, enabling the algorithms to deduce the noise threshold, depth of field, and dynamic range necessary to support the transform. For instance, if the user wishes to locally brighten a dark region of the scene, the user's input ought to lead to a different metering decision; or, if the user is happy to let the sky saturate on the display, a full exposure stack should not be necessary. In this framework, stack-based computational photography merges seamlessly with traditional photography, kicking in when, and substantially only when, necessary.

The camera control system provides a fast edit propagation algorithm; a viewfinder interface that visualizes the edits, tone-mapping, and multi-exposure blending; and camera control routines that take advantage of the knowledge of the visualization. Together, these form a system that can run at interactive rates on a mobile platform and/or a desktop platform, among other platforms.

FIG. 3 is a block diagram of the camera system 100 including a camera control system 302, according to one embodiment of the present invention. As shown, the camera control system 302 includes, without limitation, a user interface device 306, a mask generator device 304, and a metering device 305, which are coupled to one another. The user interface device 306 includes, without limitation, a real-time editor device 322 and a what-you-see-is-what-you-get (WYSIWYG) viewfinder device 308.

The user interface device 306 is a front-end device that uses a fast spatio-temporal edit propagation framework to enable stroke-based editing of the viewfinder at an interactive rate. The camera control system 302 models edits as a function over an image-patch space and stores the edits in a high-dimensional data structure.

The metering device 305 is a back-end device that uses HDR metering to optimize the quality of the edited viewfinder image that is displayed on the screen. For given user edits and a scene (e.g., represented by a tone mapped HDR image constructed in real-time), the metering device 305 generates the metering parameters (e.g., exposure time, etc.) that maximize the HDR image's appearance quality for the display. In particular, the metering device 305 factors in a perceptually motivated threshold for each displayed pixel, maps the threshold backward through the image processing pipeline (including user edits and tone-mapping) in order to compute the acceptable noise threshold at the image sensor, and then calculates the set of exposures that would satisfy the computed thresholds for the entire image. This scheme is in contrast to existing HDR metering algorithms that aim to acquire the physical scene radiance as faithfully as possible.
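
The backward mapping of a display threshold to a sensor noise budget might be sketched as follows. The tone curve is passed in as a function, and the shot-plus-read-noise sensor model, the just-noticeable display difference, and the greedy exposure selection are simplifying assumptions for illustration, not the patent's exact formulation.

    import numpy as np

    def sensor_noise_budget(radiance, tonemap, display_jnd=1.0 / 255.0, eps=1e-6):
        # A display error of display_jnd is acceptable per pixel; dividing
        # by the local slope of the tone curve (chain rule) converts that
        # into an acceptable error in scene radiance at the sensor.
        slope = (tonemap(radiance + eps) - tonemap(radiance)) / eps
        return display_jnd / np.maximum(slope, eps)

    def exposure_satisfies(radiance, budget, exposure,
                           full_well=10000.0, read_noise=4.0):
        # Which pixels meet their noise budget at this exposure, under an
        # assumed shot-plus-read-noise model; saturated pixels never do.
        electrons = np.clip(radiance * exposure, 0.0, full_well)
        noise_radiance = np.sqrt(electrons + read_noise ** 2) / max(exposure, 1e-9)
        unsaturated = radiance * exposure < full_well
        return unsaturated & (noise_radiance <= budget)

    def choose_exposures(radiance, tonemap, candidates):
        # Greedily add exposures until every pixel's budget is met.
        budget = sensor_noise_budget(radiance, tonemap)
        unmet = np.ones(radiance.shape, dtype=bool)
        chosen = []
        for t in sorted(candidates):
            ok = exposure_satisfies(radiance, budget, t)
            if np.any(ok & unmet):
                chosen.append(t)
                unmet &= ~ok
            if not unmet.any():
                break
        return chosen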

The mask generator device 304 is a back-end device that generates an edits mask by classifying texture of a scene, generating a sparse edits mask, performing edge-preserving smoothing, and then constructing the edits mask. These actions are explained further below with reference to FIG. 5.

The camera control system 302 may be carried out on a dedicated digital camera, a desktop computer, a laptop computer, a tablet computer, and/or a mobile phone, among other platforms. The camera control system 302 is further described below with reference to FIGS. 4A-5.

Example Viewfinder Editing

FIGS. 4A-4E are a sequence of diagrams that illustrate editing on a viewfinder device 308 of a user interface device 306, according to various embodiments of the present invention. The camera control system processes viewfinder edits to display more accurately an image that the user intends to capture. The camera continuously captures frames of the scene at which the camera lens is pointing and continuously uses the frames for processing in back-end operations. The back-end operations involve operations that combine the viewfinder edits and the captured frames of the camera. Accordingly, the camera control of the approach involves operations that are ongoing, iterative, and highly dependent on one another. Thus, the viewfinder editing occurs in real-time (e.g., while the camera captures frames of the scene for back-end processing).

FIG. 4A is a conceptual diagram of the camera control system 302 during an initial stage of camera control operations, according to one embodiment of the present invention. The camera control system 302 is powered on and includes a user interface device 306 having a WYSIWYG viewfinder device 308. A user 410 is pointing the camera lens (not shown) toward a live scene 404. The live scene 404 is “live” because the scene includes objects (e.g., people, landscape, animals, and/or any other object, etc.) that are potentially moving, and also because the camera is potentially moving at least a little. However, alternatively, an object in the live scene 404 may be substantially still and unmoving relative to the position of the camera.

At this initial stage, the camera control system 302 depicts the live scene 404 as an unedited image 412 on the WYSIWYG viewfinder device 308 of a user interface device 306. In this example, the camera control system 302 is illustrated as being part of a camera of a tablet computer. Other examples, besides a tablet computer, include a smart phone, a dedicated digital camera, a laptop computer, a mobile phone, a mobile device, a personal digital assistant, a personal computer or any other device suitable for practicing one or more embodiments of the present invention.

FIG. 4B is a conceptual diagram of the camera control system 302 while a user 410 is performing real-time editing on the user interface device 306, according to one embodiment of the present invention. In one implementation, the camera control system 302 can receive sparse strokes (not shown) from the user 410 on the user interface device 306 as if the user interface device 306 were a painting canvas. As the camera control system 302 receives strokes on the user interface device 306, the camera control system 302 marks corresponding image patches of a selection region 420. The camera control system 302 can receive a confirmation of the selection region 420 by receiving a tap (or mouse click, etc.) within the region of the selection region 420. Alternatively, the camera control system 302 can receive a cancellation of the selection region 420 by receiving, for example, a tap (or mouse click, etc.) outside of the region of the selection region 420.

The selection region 420 includes a portion and/or all of the pixels of the viewfinder device 308. In this example, the selection region 420 is a rectangle. Alternatively, the selection region may include another shape, such as, for example, a circle, an oval, or any type of polygon, among other shapes. The selection region 420 may also, or alternatively, be based on texture. For example, a sky may have a different texture than a person's face. Accordingly, in one implementation, the camera control system 302 can identify differences in texture and acquire the selection region based on texture matching (e.g., matching a sky's texture to select the sky as the selection region, or matching a person's texture to select the person as the selection region, etc.).

The camera control system 302 stores image patches of the selection region 420 in a data structure that supports matching image patches. In subsequent frames of the live scene 404 (which is fluid and not still), the camera control system 302 selects pixels having corresponding image patches that match the previously selected patches. As no tracking is involved, the camera control system 302 is robust against motion and/or occlusions of the scene and/or the camera lens.

Upon confirmation of the selection region 420, the real-time editor device 322 displays various edit options. In this example, the edit options include brightness, saturation, contrast, and hue. Other examples of edit options (not shown) may include, without limitation, white balance, color, tone, focus, exposure, gain, and grey scale. The camera control system 302 is configured to receive a selection of one of the edit options from the user 410.

FIG. 4C is a conceptual diagram of the camera control system 302 during real-time editing following operations of FIG. 4B, according to one embodiment of the present invention. In this example, the selection region 421 is shown as including the baby's face. The camera control system 302 is edge-preserving (e.g., edge-aware) and is able to distinguish between the texture of the object (e.g., the baby's face or the baby's skin) and the texture of another part of the live scene 404. The user interface device 306 has received from the user 410 a selection of the saturation edit option on the real-time editor device 322. The real-time editor device 322 displays on the user interface device 306 a slider that enables the camera control system 302 to receive from the user 410 a selection of saturation magnitude. In this example, the real-time editor device 322 updates accordingly as the camera control system 302 receives a stroke gesture from the user 410 to indicate a saturation magnitude. In another example, the real-time editor device 322 may be configured to receive a stroke gesture on the selection region 421 without a visible slider.

FIG. 4D is a conceptual diagram of the camera control system 302 during real-time editing following operations of FIG. 4C, according to one embodiment of the present invention. The selection region 421 (FIG. 4C) for which the camera control system 302 is receiving saturation edits becomes noisy, as shown in FIG. 4D, as the camera control system performs processing on the selection. The noisy region 424 can be a useful feature of the camera control system 302. For example, by displaying an edited image 432, the camera control system 302 can notify the user that processing is occurring for the requested user edits. This step of displaying noise is an optional step and may or may not be a feature of the camera control system 302, depending on the embodiment. For example, processing of the camera control system 302 may be sufficiently fast that the user 410 does not need to be notified by displaying noisy output.

FIG. 4E is a conceptual diagram of the camera control system 302 during real-time editing following operations of FIG. 4D, according to one embodiment of the present invention. After the selections and edits of the user 410, an edited and adjusted image 442 is shown in FIG. 4E. The camera control system 302 carried out processing on the selection region 420, as described above with reference to FIGS. 4B-4D, where saturation was edited by using real-time edit-aware metering. The displayed image 442 on the viewfinder device 308 is substantially the same image that may be captured as the user 410 inputs a request to actuate the shutter. The displayed image 442 is the result of an application of transforms that otherwise may be applied during post-processing steps. However, in the present technology, the camera control system 302 has applied the transforms during real-time usage of the camera with respect to the live scene 404, instead of during post-processing operations. Such real-time processing makes for a captured image 442 that more accurately depicts the user's intent.

Accordingly, the camera control system 302 enables the user to specify what is important by altering the local appearance, including, without limitation, brightness, saturation, contrast, hue, tone, color, and/or focus via stroke-based input. The camera control system 302 delivers visualization of these modifications to the user 410 at an interactive rate and drives the camera control routines to select better capture parameters (e.g., exposure, gain, focus, white balance, etc.), acquiring image bursts as necessary. Such processing requires real-time tracking of multiple regions of interest, real-time visualization of edits on the viewfinder, and determination of the optimal burst capture parameters given the edits. As further described below with reference to FIGS. 4 and 5, the camera control system 302 provides solutions for each of these problems, enabling interactive viewfinder editing on both desktop and mobile platforms. The viewfinder editing of the camera control system 302 improves the selection of capture parameters and provides a more engaging photography experience for the user 410.

Additional Architectural Detail

Referring again to FIG. 3, for image acquisition, the metering device 305 streams raw image data from the sensor (not shown) into a data stack that caches the most recent frames, which are to be merged and processed by a processing thread. The camera control system 302 internally acquires a full exposure or focus stack for display on the viewfinder of the user interface device 306. Otherwise, a clipped, blurry, or underexposed region may interfere with the user's selection later. Hence, the capture parameters are updated on a per-frame basis as follows: the camera control system 302 computes the desired exposure for the k-th frame by taking the histogram of the log-luminance channel of the current HDR scene estimate, removes bins that are expected to be covered by frames up to the (k−1)-th frame, and meters the remaining bins. For focal stacks, the camera control system 302 iterates from the minimum to the maximum focus distance in fixed increments.
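
A sketch of this per-frame exposure update follows; the bin bookkeeping format and the mid-gray metering heuristic are assumptions made for illustration.

    import numpy as np

    def next_exposure(hdr_luminance, covered_ranges, bins=64):
        # Histogram the log-luminance of the current HDR estimate, zero
        # out bins already covered by frames 1..(k-1), and meter the most
        # populated remaining bin.
        logl = np.log2(np.maximum(hdr_luminance, 1e-6)).ravel()
        hist, edges = np.histogram(logl, bins=bins)
        for lo, hi in covered_ranges:          # (lo, hi) intervals in log2 units
            hist[(edges[:-1] >= lo) & (edges[1:] <= hi)] = 0
        if hist.sum() == 0:
            return None                        # the whole range is covered
        i = int(np.argmax(hist))
        target = 2.0 ** (0.5 * (edges[i] + edges[i + 1]))
        return 0.18 / target                   # place that luminance at mid-gray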

For edit propagation on the viewfinder device 308, the processing thread of the metering device 305 fetches the N most recent frames (e.g., N=3 for exposure stacks, and N=4 for focal stacks) and merges the frames into an HDR radiance map or an all-focus image. The camera control system 302 can use any formula known to those skilled in the art to merge LDR images into an HDR image. In one implementation, the camera control system 302 can store the resulting scene estimate in a LogLuv format (e.g., LogLuv TIFF). Using a format based on a canonical unit is desirable because selections and edits that are based on image patches should be robust against changes in capture parameters.
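
One such merging formula, given as a hedged example only, is the standard weighted average of per-frame radiance estimates with a hat weighting that de-emphasizes clipped and noisy pixels:

    import numpy as np

    def merge_ldr_to_hdr(frames, exposures):
        # frames: linear LDR images with values in [0, 1]; exposures: the
        # exposure time of each frame. Returns an HDR radiance estimate.
        num = np.zeros_like(frames[0], dtype=np.float64)
        den = np.zeros_like(frames[0], dtype=np.float64)
        for img, t in zip(frames, exposures):
            w = 1.0 - np.abs(2.0 * img - 1.0)  # hat weight, peaks at mid-gray
            num += w * img / t                 # per-frame radiance estimate
            den += w
        return num / np.maximum(den, 1e-6)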

The mask generator device 304 models selections and edits as functions over the space of an image patch descriptor. The mask generator device 304 computes these functions over each image patch in the scene and generates masks, as further described below with reference to FIG. 5. The metering device 305 then applies the masks onto the encoded data (e.g., LogLuv-encoded data), tone-maps the resulting output, and displays the output on the viewfinder device 308. LogLuv is an encoding used for storing HDR imaging data inside another image format (e.g., TIFF). In case the user interface device 306 receives focus edits, the metering device 305 can recompose the images from the focal stack.

Based on the displayed result, the metering device 305 re-computes the optimal set of exposure and/or focus values, as further described below with reference to FIG. 5. The metering device 305 uses these parameters to affect frames that the camera captures. The metering device 305 passes the captured frame(s) through the aforementioned processing pipeline and generates one HDR image. The metering device 305 performs tone mapping on the HDR image to generate a final output for display on the viewfinder device 308.

The user interface device 306 presents the user with a seemingly normal viewfinder device 308. However, internally, the camera control system 302 is regularly acquiring an exposure and/or focal stack at the back-end. In one embodiment, as described above with reference to FIG. 4B, the user selects a region via stroke gestures. Then, the user may cancel the selection by tapping outside the selected region, or confirm the selection by tapping within it, which triggers an overlay with icons representing various types of edits (e.g., brightness, saturation, contrast, hue, etc.). Once the user chooses the type of edit to apply to the selected region, the user makes a swiping gesture horizontally left or right to shift the designated trait (e.g., darker or brighter, sharper or blurrier, etc.).

Method Overview

FIG. 5 is a flowchart of method steps for controlling a camera, according to one embodiment of the present invention. In some implementations, the method steps may be carried out by the camera control system 302 of FIG. 3, which includes a user interface device 306, a mask generator device 304, and a metering device 305. As one skilled in the art will recognize, the method steps are fluid, iterative, and highly dependent on one another. For explanatory purposes only, the description of the method steps below arbitrarily starts at the user interface device 306. However, in other examples, the description of the method steps could begin at the metering device 305 or somewhere else in the architecture. Regardless, although the method steps are described herein in conjunction with the systems of FIGS. 1-3, one skilled in the art will understand that any system configured to implement the method steps, in any order, falls within the scope of the present invention.

As shown, a method 500 begins in an action 502, in which the user interface device 306 displays an image on a viewfinder (e.g., WYSIWYG viewfinder device 308 of FIG. 3). For example, the camera is powered on; the camera lens is pointed at and is receiving ongoing input from a scene. Accordingly, the image includes, at least in part, the camera's interpretation of the scene at which the camera lens is pointing. Meanwhile, the viewfinder device receives real-time input from the metering device 305 in an action 520, which is further described below. Accordingly, the image displayed on the user interface device 306 is a real-time combination of frames captured by the camera lens and of processing performed on the frames with respect to user edits. As further described below in other steps, the combination of input is real-time and continuously changing as the scene changes and/or the user edits change.

In a decision operation 504, the user interface device 306 determines if a request to actuate the shutter is being received. If yes, in an action 506, the user interface device 306 sends a request to the appropriate component to actuate the shutter. The camera control system 302 can then actuate the shutter and capture a final image that is displayed on the WYSIWYG viewfinder of the user interface device 306.

However, in the decision operation 504, if the user interface device 306 determines there is no request to actuate the shutter, the method 500 moves to a decision operation 508, where the user interface device 306 determines if user edits are being received. If no, then the method 500 is at an end and may return to the start to continue.

However, in the decision operation 508, if the user interface device 306 determines that user edits are being received, then the method 500 moves to an action 510, where the user interface device 306 receives the user edits. For example, the user communicates the user's intention by selecting a region and/or all of the viewfinder and performing edits (e.g., brightening or darkening, shifting color, altering saturation and contrast, etc.) via scribble gestures.

In an action 511, the user interface device 306 sends the one or more user edits to the mask generator device 304.

In an action 512, the mask generator device 304 classifies image patches of the selection region with respect to the one or more user edits. For example, the mask generator device 304 begins the process of transforming the selection region into a data format (e.g., image patches) that the camera control system can match over multiple frames. Meanwhile, the mask generator device 304 receives feedback from the metering device 305 during an action 526, which is further discussed below.

In an action 514, the mask generator device 304 specifies a sparse edits mask. For example, the mask generator device 304 enables image patches to be matched in subsequent viewfinder frames, so that the gestures remain persistent. Matching image patches over multiple viewfinder frames can be achieved by matching image patches that look alike (e.g., matching each 8-pixel-by-8-pixel texture patch, or a texture patch of any size). Each patch includes a subset of the pixels of the selection region. Whenever the user scribbles over a patch to select the patch or to apply edits, the camera control system updates the displayed image to reflect the change. Propagation is achieved by matching the patches in each viewfinder frame in order to infer the selection and the application of edits.

In an action 516, the mask generator device 304 performs edge-aware up-sampling on the edits mask. Edge-aware smoothing is further described below in the sections providing additional viewfinder editing detail.
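
One edge-aware choice for this step is a guided filter applied to a coarse mask under the guidance of the full-resolution luminance image; the patent does not commit to a specific filter, so the following is an assumed stand-in:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def guided_upsample(mask_lo, guide, radius=8, eps=1e-3):
        # Nearest-neighbor upsample of the sparse edits mask, then a
        # guided filter: the mask is modeled as locally affine in the
        # guide image, so the output inherits the guide's edges.
        h, w = guide.shape
        rows = np.round(np.linspace(0, mask_lo.shape[0] - 1, h)).astype(int)
        cols = np.round(np.linspace(0, mask_lo.shape[1] - 1, w)).astype(int)
        mask = mask_lo[rows[:, None], cols[None, :]]

        size = 2 * radius + 1
        mean_i = uniform_filter(guide, size)
        mean_p = uniform_filter(mask, size)
        corr_ip = uniform_filter(guide * mask, size)
        corr_ii = uniform_filter(guide * guide, size)
        a = (corr_ip - mean_i * mean_p) / (corr_ii - mean_i ** 2 + eps)
        b = mean_p - a * mean_i
        return uniform_filter(a, size) * guide + uniform_filter(b, size)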

In an action 518, the mask generator device 304 generates an edits mask for use by the metering device 305. The edits mask takes into account the user edits as propagated over the image patches of the scene.

In an action 520, the metering device 305 performs a tone mapping operation. Meanwhile, in an action 522, the metering device 305 performs metering operations. Based on the edits mask and the tone mapped HDR image, the metering device 305 generates the metering parameters (e.g., exposure time, etc.) that maximize the HDR image's appearance quality for the display. For example, the metering device 305 quantifies metering requirements based on an edits mask and a tone mapped HDR image, meters the tone mapped HDR image to calculate metering parameters, and provides to the camera the metering parameters that affect frame capturing operations and thereby maximize the appearance of the captured HDR image for the display (e.g., displaying the post-edits).

For instance, the metering device 305 is configured to react to user edits, for example, by introducing metered output including a bracket (e.g., exposure brackets and/or focus brackets, etc.) in order to comport with the constraints implied by the user edits. The metering device 305 analyzes the tone mapped HDR image and the edits mask to determine the metering parameters. The metering device 305 then selects the capture parameters for the next viewfinder frame to be requested. Depending on the types and extent of edits, certain luminance and/or depth ranges may become more or less important to capture, relaxing and/or tightening the constraints on the algorithms of the metering device 305. The metering device 305 feeds the display metering back into the capture pipeline to affect subsequently captured frames.

In an action 524, the metering device 305 captures frames of the live scene. In this example of FIG. 5, the metering device 305 captures frames of a sliding window 530 having a predetermined number of frames. The sliding window 530 includes a sequence of three frames at a time. For example, at a time t_(k−1), the sliding window includes a sequence of adjacent frames, including frame N-5, frame N-4, and frame N-3. At a time t_(k), the sliding window includes a sequence of adjacent frames, including frame N-2, frame N-1, and frame N, and so on. The size of the sliding window may be affected by the user edits. For example, the edits masks generated at the action 518 can affect the size of the sliding window. For instance, if the camera control system 302 receives an input from the user to brighten a very dimly lit selection of the scene, then the edits may require an additional shot (e.g., an additional frame in the sliding window 530) with longer exposure and/or higher gain in order to recover data in that selection region. Such user edits affect the level of acceptable signal-to-noise ratio (SNR), which is directly linked to the optimal choice of exposure and/or gain.
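
A sliding window of this kind reduces to a small ring buffer whose capacity the metering device can grow when pending edits call for extra exposures; the sketch below, with assumed names, illustrates the idea:

    from collections import deque

    class FrameWindow:
        # Caches the most recent raw frames together with their capture
        # parameters; resize() lets edit-driven metering demand an extra
        # exposure (e.g., a longer shot to recover a brightened shadow).
        def __init__(self, n=3):
            self.frames = deque(maxlen=n)

        def push(self, frame, params):
            self.frames.append((frame, params))

        def resize(self, n):
            self.frames = deque(self.frames, maxlen=n)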

In an action 526, the metering device 305 generates one HDR image. For example, the metering device 305 blends, aligns, and/or merges the frames of the sliding window 530 to generate one estimate of a high dynamic range (HDR) image. In another example (not shown), the camera system includes a camera sensor that directly captures HDR frames. In such a case, the metering device 305 does not need to blend multiple frames as shown in FIG. 5. Generally, the metering device 305 needs to generate one HDR image; the particular manner in which the metering device 305 generates the HDR image is not critical.

Referring again to the action 520, the metering device 305 combines the edits mask and the HDR image to perform the tone mapping operation. The metering device 305 then provides an edited and adjusted image for display on the viewfinder device of the user interface device 306. For example, the camera control system asynchronously writes to textures the user edits of the user interface device 306, the selection mask of the mask generator device 304, and the scene radiance estimates of the metering device 305.

Returning to the action 502, the user interface device 306 displays the edited and adjusted image received from the metering device 305. For example, the frontend application (e.g., the Android user interface) composes the final image and then renders the final image for display. As described above, the actions of the method 500 are iterative. Accordingly, the camera control system regularly updates the displayed image as the camera control system receives user edits, receives scene changes, and/or receives camera movement, among other input.

The method 500 may include other actions and/or details that are not discussed in this method overview. For example, the method 500 is applicable to video as well as still images. A difference with video is that actuation of the shutter causes the camera to capture a sequence of images over a period of time, as opposed to one still image at one moment in time. For video, accurate control over capture parameters without causing undesirable temporal artifacts is even more challenging (e.g., a too-drastic change of the exposure settings is perceived by humans as flickering). As another example, at least some method steps may be carried out by using the parallel processing subsystem 112 of FIGS. 1 and 2; it may be desirable for the camera control system to process some method steps in parallel for the sake of speed and for having an optimal user experience. Other actions and/or details described herein may be a part of the method 500, depending on the implementation. Persons skilled in the art will understand that any system configured to implement the method steps, in any order, falls within the scope of the present invention.

Additional Viewfinder Editing Detail

Image editing on the viewfinder device 308 must accommodate temporally persistent selection of objects through sparse user input. At the same time, the camera control system 302 processes each viewfinder frame independently without relying on preprocessing or training an expensive classifier.

Borrowing from conventional work on affinity-based edit propagation on image sequences, the camera control system 302 models edits and the selection as functions residing in a space of local patch descriptors:

S_i: ℝ^n → [−1, 1],  (1)

where n (e.g., 8) is the dimensionality of the patch descriptor, and each of S₁, S₂, . . . corresponds to a particular type of edit, such as tone, color, saturation, or blurriness. The value of 0 corresponds to no editing. The camera control system 302 reserves S₀: ℝ^n → [0, 1] as a soft selection mask.

In one example, the camera control system 302 can use an 8-dimensional descriptor (e.g., n=8) based on an 8 pixel × 8 pixel image patch, composed of the mean and the first-order and the second-order derivatives of the log-luminance channel, plus the mean CIELUV chrominance. (CIELUV is a color space adopted by the International Commission on Illumination (CIE) in 1976.) To decide which features to use, the camera control system 302 performs a principal component analysis (PCA) on a set of generic image patches. The strongest PCA component has been found to be similar to the patch mean, while the next components could be reasonably approximated by the derivatives of the log-luminance, and so forth. Note that in a departure from previous work, the camera control system 302 drops the (x, y) coordinate from the descriptor, in order to be robust against scene and camera motion.
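
A plausible breakdown of such an 8-dimensional descriptor is sketched below: one mean, two first-order and three second-order log-luminance derivatives, and two mean chrominance channels. The exact split into eight components is an assumption consistent with, but not dictated by, the text above.

    import numpy as np

    def patch_descriptor_8d(log_lum, chroma_u, chroma_v, x, y, size=8):
        # 8-D descriptor of the (size x size) patch at (x, y).
        L = log_lum[y:y + size, x:x + size]
        dy, dx = np.gradient(L)                # first-order derivatives
        dxx = np.gradient(dx, axis=1)          # second-order derivatives
        dxy = np.gradient(dx, axis=0)
        dyy = np.gradient(dy, axis=0)
        return np.array([
            L.mean(),
            dx.mean(), dy.mean(),
            dxx.mean(), dxy.mean(), dyy.mean(),
            chroma_u[y:y + size, x:x + size].mean(),
            chroma_v[y:y + size, x:x + size].mean(),
        ])

Note that no (x, y) term appears in the returned vector, matching the motion-robustness rationale above.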

Existing methods attempt to globally optimize or interpolate $\vec{S}_{i}$ based on the user-provided samples, and as a result, the cost of estimating $\vec{S}_{i}$ scales linearly with the number and extent of the edits the user has already performed. Instead, the camera control system 302 stores $\vec{S}_{i}$ in a sparse data structure and treats the estimation as a lookup problem. Thus, incrementally updating $\vec{S}_{i}$ has an O(1) cost.

Because the camera control system 302 forgoes an explicit optimization or interpolation action, edits may not propagate as aggressively as with other methods. However, this issue is mitigated in two ways: first, the camera control system 302 applies edge-aware smoothing on $S_{i}$ with respect to the scene image whenever a viewfinder frame is produced. Second, because the camera control system sends feedback interactively as it receives the user's strokes, controlling propagation is easy and intuitive for the user. For example, the user paints (e.g., provides strokes for) $S_{i}$ interactively.

Viewfinder Editing: Representing Edits

For storing $\vec{S}_{i}$, the camera control system 302 adapts the commonly known permutohedral lattice, which tiles high-dimensional space with simplices and stores samples at the vertices. The camera control system 302 uses the lattice to perform barycentric interpolation for insertion and lookup, which incidentally serves to locally propagate the stored data. While the lattice is commonly used to support high-dimensional filtering algorithms, the camera control system 302 can use it to house high-dimensional functions quite effectively.

Instead of initializing the lattice with all patches present in a given image, as one could do for high-dimensional filtering, the camera control system 302 takes a streaming approach: as the user strokes over the screen and selects patches (e.g., selection region 420), the camera control system 302 locates only those vertices corresponding to these patches and updates their values. Note that unselected patches are never written into the lattice. If a patch lookup fails at any point, a default value is assumed for $\vec{S}_{i}$.

To further support streaming edits, the camera control system 302 can augment the lattice with a decaying scheme similar to the one used in a commonly available video filtering algorithm. The camera control system 302 associates with each vertex a perceptual importance measure, which it increases every time it accesses the vertex and which decays exponentially over time. Hence, an image patch that remains in the viewfinder will have high importance, whereas a patch that goes out of view for a long time will have low importance. The camera control system 302 keeps track of the time each vertex was last updated. Whenever a vertex is accessed for a read or a write, the camera control system 302 decays the vertex's importance appropriately. When the lattice is at capacity and a new vertex must be inserted, the camera control system 302 examines the nodes with hash collisions and evicts the vertices with the lowest importance.
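
A minimal sketch of such a decaying vertex store follows. A plain hash map stands in for the permutohedral lattice's vertex table (the barycentric splatting over simplex corners is elided), the half-life constant is assumed, and eviction scans globally rather than only among hash-colliding nodes:

```python
import time

HALF_LIFE = 10.0  # seconds; an assumed decay constant, not from the text

class Vertex:
    __slots__ = ("edits", "importance", "last_access")
    def __init__(self, edits):
        self.edits = edits          # [S_0, S_1, ..., S_n] at this vertex
        self.importance = 1.0
        self.last_access = time.monotonic()

class DecayingLattice:
    """Simplified stand-in for the decaying permutohedral vertex store."""

    def __init__(self, capacity=4096, num_edit_types=4):
        self.capacity = capacity
        self.default = [0.0] * (num_edit_types + 1)   # S_0 .. S_n
        self.table = {}                               # key -> Vertex

    @staticmethod
    def _decayed(v, now):
        # Exponential decay of importance since the last access.
        return v.importance * 0.5 ** ((now - v.last_access) / HALF_LIFE)

    def _touch(self, v):
        now = time.monotonic()
        v.importance = self._decayed(v, now) + 1.0    # decay, then bump
        v.last_access = now

    def lookup(self, key):
        """Read-only: a failed lookup falls back to the default values."""
        v = self.table.get(key)
        if v is None:
            return list(self.default)
        self._touch(v)
        return v.edits

    def update(self, key):
        """Insert-or-get; at capacity, evict the least important vertex."""
        v = self.table.get(key)
        if v is None:
            if len(self.table) >= self.capacity:
                now = time.monotonic()
                victim = min(self.table,
                             key=lambda k: self._decayed(self.table[k], now))
                del self.table[victim]
            v = self.table[key] = Vertex(list(self.default))
        self._touch(v)
        return v
```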

Viewfinder Editing: Specifying Edits

The camera control system 302 is configured to specify edits, as described above with reference to FIGS. 4A-4E. The camera control system 302 receives strokes over the region of interest and receives a confirmation of the selection (e.g., a tap and/or a mouse click on the selected area). Then, the camera control system 302 presents the user with a widget (e.g., real-time editor device 322) listing the various types of edits supported, and the camera control system 302 receives a selection (e.g., a tap) of the user's choice. Next, the camera control system 302 receives input (e.g., the user horizontally swipes left or right) that specifies a magnitude and a direction of the edit. All of the actions for specifying edits are interactive. For example, as the user moves his finger (or stylus, mouse pointer, etc.) on the screen, the camera control system 302 performs back-end processing and updated edits are depicted on the viewfinder.

For patch selection (e.g., selection region 420), while the user stroke is being registered, the camera control system 302 converts image patches whose centers are within a small fixed distance from the event origin into descriptors and looks up the descriptors in the lattice (e.g., the permutohedral lattice described above). If the corresponding nodes do not exist, the camera control system 302 generates and initializes them. The camera control system 302 increments the value of S₀ for these nodes. As such, the cost of applying a selection is O(1), independent of the viewfinder dimensions and the edit history.
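
Continuing the sketch above, the selection step can be expressed as an O(1) update per stroked patch; the descriptor-to-key quantization and the step size `delta` are assumptions of this sketch:

```python
def apply_stroke(lattice, stroke_patch_keys, delta=0.1):
    """Increment the soft selection S_0 for patches under the user's
    stroke. `stroke_patch_keys` yields one lattice key per image patch
    whose center lies near the touch event; cost is O(1) per patch."""
    for key in stroke_patch_keys:
        v = lattice.update(key)                     # create if absent
        v.edits[0] = min(1.0, v.edits[0] + delta)   # keep S_0 in [0, 1]
```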

For visualization on the viewfinder device 308, for each viewfinder frame, the camera control system 302 converts the image patches contained within the frame into descriptors and looks up each descriptor's associated edits in the lattice. If the user is in the third phase and is currently applying an edit of type j with extent k, then for each descriptor patch $\vec{p}$, the camera control system 302 adjusts $S_{j}(\vec{p})$ as follows:

$\begin{matrix}{{S_{j}\left( \vec{p} \right)}:={{S_{j}\left( \vec{p} \right)} + {k \cdot {S_{0}\left( \vec{p} \right)}}}.} & (2)\end{matrix}$

This adjustment occurs dynamically and is not written into the lattice. The camera control system then applies $S_{j}(\vec{p})$. Thus, the cost of visualizing each viewfinder frame grows linearly with the viewfinder dimensions and is independent of the content of the lattice.

For finalizing an edit, once the camera control system 302 receives a magnitude and a direction of an edit option, the camera control system 302 folds the current edit operation into the lattice by applying Equation 2 to every patch $\vec{p}$ in the lattice, and resets $S_{0}(\vec{p})$ to zero. Hence, the cost of finalizing an edit is proportional to the size of the corresponding lattice and independent of the viewfinder dimensions.
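
Both steps can be sketched under the simplified lattice above: the per-frame pass applies Equation 2 transiently without writing into the lattice, while finalization folds the edit into every stored vertex and clears S₀ (function names and the edit-type indexing are illustrative):

```python
def visualize_frame(lattice, frame_keys, j, k):
    """Per-frame preview via Equation 2: adjust S_j transiently for each
    on-screen patch; the lattice itself is left untouched."""
    out = []
    for key in frame_keys:
        S = list(lattice.lookup(key))   # unseen patches get defaults
        S[j] += k * S[0]                # preview only
        out.append(S)
    return out

def finalize_edit(lattice, j, k):
    """Fold the pending edit into every stored vertex and reset S_0;
    cost scales with the lattice size, not the viewfinder."""
    for v in lattice.table.values():
        v.edits[j] += k * v.edits[0]
        v.edits[0] = 0.0
```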

For edge-aware smoothing, in one implementation of visualization, the camera control system 302 processes a sub-sampled set of image patches. This limitation saves a significant amount of time but yields user edit masks at a resolution lower than that of the viewfinder. The camera control system 302 can apply edge-aware up-sampling to an intermediate edit mask by using a domain transform (e.g., a time domain transform and/or a frequency domain transform) with respect to the edges of the viewfinder content (i.e., the edges of objects in frames captured by the camera lens). Not only does this operation enable the camera control system 302 to speed up processing, it also enables the camera control system 302 to generate higher-quality masks with improved spatial edit propagation. For example, the edge-aware up-sampling operation provides a data format to a processor that causes the processor to operate at a higher rate than a processor operating without the edge-aware up-sampling. As another example, the edge-aware up-sampling provides a data format to a processor that causes the edits mask to have a higher degree of spatial edit propagation. When the user orders a full-frame capture, the camera control system 302 generates a full-resolution edit mask from the lattice but still applies edge-aware smoothing with the domain transform.
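
For concreteness, a one-dimensional recursive pass in the style of a domain-transform edge-aware filter is sketched below; a full two-dimensional mask filter would alternate horizontal and vertical passes over the edit mask, and the parameter values here are illustrative:

```python
import numpy as np

def dt_smooth_1d(signal, guide, sigma_s=60.0, sigma_r=0.4):
    """One recursive pass of a domain-transform-style edge-aware filter
    along a scanline: smoothing spreads freely across flat regions of
    `guide` but stops at its strong edges."""
    a = np.exp(-np.sqrt(2.0) / sigma_s)
    # Warped distance between neighbors: large jumps in the guide
    # (edges) increase the distance and block propagation.
    d = 1.0 + (sigma_s / sigma_r) * np.abs(np.diff(guide, prepend=guide[0]))
    w = a ** d
    out = signal.astype(np.float64).copy()
    for i in range(1, len(out)):              # left-to-right pass
        out[i] = (1.0 - w[i]) * out[i] + w[i] * out[i - 1]
    for i in range(len(out) - 2, -1, -1):     # right-to-left pass
        out[i] = (1.0 - w[i + 1]) * out[i] + w[i + 1] * out[i + 1]
    return out
```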

Edit-Based Camera Control

Described above are an interface and underlying algorithms for performing edits directly on the viewfinder of the camera control system 302 and propagating the edits forward in time. While those features benefit the user by allowing the user to rethink composition based on the edits, they also provide the camera control system 302 with additional information that it can use to better select capture metering parameters. Described below are camera control techniques for calculating two types of parameters, (1) exposure and (2) focus, for each pixel based on the pixel's intended appearance. These two types of parameters are provided for explanatory purposes. As described above, other parameters include brightness, saturation, contrast, hue, white balance, color, tone, and gain, among others. Also described below is a methodology for aggregating the results of back-end operations for pixels in the viewfinder to generate a set of metering and/or focusing parameters for the scene.

Edit-Based Camera Control: HDR Metering for Display

Conventional HDR metering algorithms operate to faithfully acquire scene luminance, attempting to maximize the signal-to-noise ratio (SNR). This philosophy makes sense when the post-processing to be performed on the luminance data is unknown and there is no additional information on the importance of different scene elements.

In contrast, in the present technology, the camera control system 302 can leverage the fact that the entire post-processing pipeline, including tone mapping, is known. The user sees a tone mapped HDR image on the viewfinder and, if some areas are too dark, can initiate processes to brighten those areas. A request to brighten indicates that longer exposures are needed; a request to darken saturated areas to increase contrast indicates that shorter exposures are needed. The viewfinder image reflects the edits, and once the user is satisfied with the result, the user can request actuation of the shutter and thereby cause the camera to capture the high-resolution HDR image.

The camera control system 302 can quantify per-pixel exposure requirements. The analysis takes into account image fidelity at each pixel and derives the exposure necessary to meet a particular threshold. Let L be the physical scene luminance estimated by the camera, perhaps from multiple exposures, and let I be the k-bit tone mapped result under a global, strictly monotonic tone mapping operator T. The user's edits create an edits map E which, in a spatially varying manner, modulates the luminance estimate L. In one implementation, the camera control system 302 sets $E\left( {x,y} \right) = 2^{6{S_{i}\left( {\vec{p}}_{x,y} \right)}}$, corresponding to an adjustment of up to +/−6 stops. The camera control system 302 finally clamps the result into k bits:

$\begin{matrix}{{I\left( {x,y} \right)} = {\min\left( {{2^{k} - 1},{T\left( {{L\left( {x,y} \right)} \cdot {E\left( {x,y} \right)}} \right)}} \right)}.} & (3)\end{matrix}$
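
As a small worked sketch of Equation 3, assuming a Reinhard-style global operator for T (the operator and the function name are illustrative, not the system's actual tone map):

```python
import numpy as np

def display_value(L, E, k=8):
    """Equation 3 with an assumed strictly monotonic global tone map T
    scaled to k bits; L is estimated luminance, E the per-pixel edit gain."""
    T = lambda x: (2**k - 1) * x / (1.0 + x)
    return np.minimum(2**k - 1, T(L * E))
```

For instance, brightening a region by one stop doubles E there, which pushes I(x, y) toward the top of the display range and, via the bounds derived below, lengthens the exposure needed to keep its noise acceptable.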

For each of the 2^(k) display levels, the camera control system 302 associates a threshold for acceptable visual distortion, modeled as Gaussian noise:

$\sigma_{d}:{\left\{ {0,\ldots,{2^{k} - 1}} \right\}\rightarrow\mathbb{R}^{+}}.$

In other words, the pixel value I(x, y) on the display should have a standard deviation of σ_(d)(I(x, y)) or less. This threshold depends on the viewing conditions, display resolution, and the user's visual adaptation, but for a bright display (e.g., photopic vision), σ_(d) is approximately constant for a given intensity level.

Then, assuming non-saturation, the camera control system 302 should use the metering algorithm to attempt to record each pixel's physical luminance L(x, y) so that each pixel's associated uncertainty σ_(w), when carried through the imaging and tone mapping processes, has a standard deviation no larger than σ_(d)(I(x, y)). For a sufficiently small uncertainty, the camera control system 302 can apply a first-order approximation to the tone mapping process to obtain,

$\begin{matrix}{{\frac{\sigma_{w}\left( {x,y} \right)}{\Delta{L\left( {x,y} \right)}} \approx \frac{\sigma_{d}\left( {I\left( {x,y} \right)} \right)}{\Delta{I\left( {x,y} \right)}}},\mspace{14mu}{{\sigma_{w}\left( {x,y} \right)} \approx \frac{\sigma_{d}\left( {I\left( {x,y} \right)} \right)}{{E\left( {x,y} \right)} \cdot {T^{\prime}\left( {{L\left( {x,y} \right)} \cdot {E\left( {x,y} \right)}} \right)}}},} & (4)\end{matrix}$

via the chain rule, where T′(·) is the derivative of T(·) with respect to L.

Finally, the camera control system 302 assumes a c-bit linear camera that captures the scene and records raw pixel values:

$\begin{matrix}{{p\left( {x,y} \right)} = {\min\left( {{2^{c} - 1},{{{L\left( {x,y} \right)} \cdot t \cdot K} + {N\left( {0;\sigma_{r}} \right)}}} \right)},} & (5)\end{matrix}$

where t is the exposure time; K is a calibration constant; N(0; σ_(r)) is additive (e.g., Gaussian) read noise; and the camera control system 302 clamps the measurement to simulate saturation.

The camera control system 302 uses the HDR reconstruction algorithm to divide each pixel value by t·K to yield L(x, y), which also lowers the standard deviation of the noise to σ_(r)/(t·K). This noise should be below σ_(w)(x, y) from Equation 4, providing a lower bound on the exposure time:

$\begin{matrix}{{\frac{\sigma_{r}}{K} \cdot \frac{E{\left( {x,y} \right) \cdot {T^{\prime}\left( {{L\left( {x,y} \right)} \cdot {E\left( {x,y} \right)}} \right)}}}{\sigma_{d}\left( {I\left( {x,y} \right)} \right)}} \leq {t.}} & (6)\end{matrix}$

The camera control system 302 also enforces an upper bound to avoid sensor saturation:

$\begin{matrix}{t < {\frac{2^{c} - 1}{K \cdot {L\left( {x,y} \right)}}.}} & (7)\end{matrix}$

For pixels that saturate on the sensor, the estimate L(x, y) must be such that, when multiplied by E(x, y) and tone mapped, the result saturates the display. This gives an additional constraint:

$\begin{matrix}{t \leq {\frac{\left( {2^{c} - 1} \right){E\left( {x,y} \right)}}{K \cdot {T^{- 1}\left( {2^{k} - 1} \right)}}.}} & (8)\end{matrix}$

This analysis can easily be extended to handle nonlinear cameras by folding the inverse camera response function into T. The camera control system 302 can fold other sources of noise, such as photon noise, into the read noise by allowing σ_(r) in Equation 6 to vary as a function of the pixel value.
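
Gathering Equations 6-8, the per-pixel bounds can be sketched as follows; the calibration inputs and function names are placeholders for whatever the camera pipeline supplies:

```python
import numpy as np

def exposure_bounds(L, E, sigma_d, sigma_r, K, T, T_prime, T_inv, k=8, c=10):
    """Per-pixel exposure-time bounds. L, E: arrays of luminance estimates
    and edit gains; T, T_prime, T_inv: the tone map, its derivative, and
    its inverse; sigma_d: a function mapping display levels to noise
    thresholds. All parameter values are illustrative placeholders."""
    I = np.minimum(2**k - 1, T(L * E))                     # Equation 3
    # Equation 6: noise after dividing by t*K must stay below sigma_w.
    t_min = (sigma_r / K) * E * T_prime(L * E) / sigma_d(I)
    # Equation 7: stay below sensor saturation for the estimated L.
    t_max = (2**c - 1) / (K * L)
    # Equation 8: pixels meant to saturate the display may cap t earlier.
    t_sat = (2**c - 1) * E / (K * T_inv(2**k - 1))
    return t_min, t_max, t_sat
```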

The camera control system 302 can optimize the HDR stack. Now that the camera control system 302 has derived the necessary conditions on each pixel, it can combine the required constraints to solve for a set of exposures that best satisfies the constraints. Typically, a scene is handled by no more than 3 exposures, and most cameras offer only a limited number of possible exposure values.

FIGS. 6A-6C are diagrams of an example of edit-based metering via per-pixel analysis, according to one embodiment of the present invention. For each pixel on the screen, the metering device 305 calculates the minimal and maximal permissible exposure values, accounting for the local and global transforms the raw sensor values undergo.

FIG. 6A is a diagram illustrating an example case (a) of edit-based metering via per-pixel analysis, according to one embodiment of the present invention. For metering, each pixel yields an objective function J(x, y, t) based on the minimum and maximum per-pixel exposure values B_(*)(x, y) and B^(*)(x, y).

FIG. 6B is a diagram illustrating an example case (b) of edit-based metering via per-pixel analysis, according to one embodiment of the present invention.

FIG. 6C is a diagram illustrating an example aggregation of per-pixel objectives, according to one embodiment of the present invention. The camera control system 302 aggregates the per-pixel objectives into a single objective. This example illustrates an aggregation for a conventional tone mapping operator.

The camera control system 302 implements a greedy approach that seeks to iteratively maximize the aggregate objective function Σ_(x,y) J(x, y, t) with respect to the exposure time t. The objective function should penalize exposure times outside the lower and upper bounds B_(*)(x, y) and B^(*)(x, y) derived at each pixel using Equations 6-8; for such times, the camera control system 302 sets J=0. Otherwise, if the output pixel p(x, y) is saturated, the camera control system 302 favors shorter exposures. The camera control system 302 uses the objective function

$\begin{matrix}{{J\left( {x,y,t} \right)} = \left\{ \begin{matrix}{0, {{{if}\mspace{14mu} t} \notin \left\lbrack {{B_{*}\left( {x,y} \right)},{B^{*}\left( {x,y} \right)}} \right\rbrack},} \\{{1 + {{\alpha \left( {x,y} \right)}\log_{2}\frac{t}{B_{*}\left( {x,y} \right)}}},\mspace{14mu} {otherwise},}\end{matrix} \right.} & (9)\end{matrix}$

illustrated in FIGS. 6A and 6B on a logarithmic time axis, with α(x,y)=−0.3 if the pixel is saturated, and 0.0 otherwise.

When t is mapped to the logarithmic domain, the objective in Equation (9) becomes a sum of piecewise-linear functions, which the camera control system 302 maximizes in linear time using dynamic programming, by pre-computing and caching the first- and second-order derivatives of the objective. The camera control system 302 greedily finds exposures that maximize the objective, terminating when it reaches the maximum size of the stack or satisfies a certain percentage of the per-pixel requirements, whichever occurs first. FIG. 6C illustrates the objective functions for two different tone mapping operators.
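
A simplified sketch of this stack selection follows. It replaces the piecewise-linear dynamic program with a grid search over log exposure times, ignores the saturation slope α of Equation 9, and greedily covers as many per-pixel intervals [B_(*), B^(*)] as possible; the thresholds are illustrative:

```python
import numpy as np

def greedy_exposures(t_min, t_max, max_stack=3, coverage=0.95, grid_size=256):
    """Greedily pick exposure times covering the most still-unsatisfied
    pixels; t_min and t_max are the per-pixel bounds from
    exposure_bounds() above."""
    lo, hi = np.log2(t_min).ravel(), np.log2(t_max).ravel()
    grid = np.linspace(lo.min(), hi.max(), grid_size)
    # covered[g, p]: grid exposure g satisfies pixel p's bounds.
    covered = (grid[:, None] >= lo[None, :]) & (grid[:, None] <= hi[None, :])
    unsatisfied = np.ones(lo.shape, dtype=bool)
    stack = []
    while len(stack) < max_stack and unsatisfied.mean() > 1.0 - coverage:
        gains = (covered & unsatisfied[None, :]).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        stack.append(float(2.0 ** grid[best]))
        unsatisfied &= ~covered[best]
    return stack
```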

Edit-Based Camera Control: Stack Focusing for Display

Another popular target for manipulation is the depth of field. The camera control system 302 can use a conventional technique to combine a focal stack (e.g., a set of images focused at different depths) to simulate an extended depth of field. The camera control system 302 can also reduce the depth of field in a similar manner, or by using a conventional technique to perform image-space blurring. If the user can interactively specify the desired manipulation prior to capture and verify the manipulation via visualization in the viewfinder, the camera control system 302 can use a conventional autofocus routine to deduce the minimal focal stack required for the composition, instead of capturing the full focal stack, which is expensive.

The camera control system 302 can quantify a per-pixel focus requirement. Using the interface described above with reference to FIGS. 4A-4E, the user paints a mask F: {(x, y)}→[−1, 1] specifying, for example, which regions should be sharper or blurrier than in a reference photograph focused at depth z₀ ∈ [z_(min), z_(max)]. The camera control system 302 measures the depths in diopters. Under a simple conventional thin-lens model, the blur size changes linearly with the offset in diopters. Simultaneously, the viewfinder stream cycles through a number of focus settings to continuously acquire the scene at various depths and builds a rough depth map based on a local contrast measure. Using the per-pixel mask F: at F=0, the camera control system 302 uses the reference depth z₀; at F=1, it uses the maximally sharp depth z*; at F=−1, it uses the maximally blurred depth z_(ext) at the pixel (either z_(min) or z_(max), whichever causes the formula to interpolate away from z*); at other values, the camera control system 302 interpolates linearly:

$\begin{matrix}{{\hat{z}} = {{\left| F \right| \cdot z_{t}} + {\left( {1 - \left| F \right|} \right) \cdot z_{0}}},} & (10)\end{matrix}$

where z_(t) is z* if F≥0 and z_(ext) otherwise.
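
A direct sketch of Equation 10, with the choice of z_(ext) made per pixel as described above (array names are illustrative):

```python
import numpy as np

def target_depth(F, z0, z_star, z_min, z_max):
    """Per-pixel focus target from Equation 10. F in [-1, 1] is the
    painted mask; z0 the reference depth; z_star the maximally sharp
    depth per pixel; all depths are in diopters."""
    # Blur target: whichever end of the focus range lies away from z_star.
    z_ext = np.where(np.abs(z_min - z_star) > np.abs(z_max - z_star),
                     z_min, z_max)
    z_t = np.where(F >= 0, z_star, z_ext)
    return np.abs(F) * z_t + (1.0 - np.abs(F)) * z0
```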

After ẑ is regularized with a cross-bilateral filter using the scene image, the camera control system 302 obtains the synthesized scene by sampling from the appropriate slice of the focal stack at each pixel. When the appropriate depth is not available, the camera control system 302 interpolates linearly from the two nearest slices. The camera control system 302 updates the viewfinder with the synthetic image continuously.

The camera control system 302 can optimize the focal stack. The map ẑ obtained in the previous section covers a continuous range of depths, which is impractical to capture. To discretize ẑ into a few representative values, the camera control system 302 reuses the framework described with reference to Equations 3-9 for optimizing a sum of piecewise-linear functions. The per-pixel objective is 1 at the desired depth ẑ(x, y), linearly reducing to zero at a depth error of ε (the camera control system 302 uses ε=1.0 for a lens with a depth range of 0.0 to 10.0 diopters):

$\begin{matrix}{{J\left( {x,y,z} \right)} = {{\max\left( {0,\frac{\varepsilon - {{z - {\hat{z}\left( {x,y} \right)}}}}{\varepsilon}} \right)}.}} & (11)\end{matrix}$

The camera control system 302 aggregates this per-pixel objective over all pixels under consideration on the viewfinder. Because Equation 11 is piecewise linear, the camera control system 302 can optimize the aggregate objective quickly, as described with reference to Equations 3-9. Once again, the camera control system 302 greedily selects focus distances that maximize the objective, stopping when it has ordered 10 slices or when, for most of the pixels, ẑ is close to one of the focus distances in the set.
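
The focal-stack counterpart of the greedy exposure selection can be sketched in the same way, treating a pixel as satisfied once some chosen focus distance falls within ε of its target ẑ (i.e., inside the support of Equation 11); the grid and thresholds are illustrative:

```python
import numpy as np

def greedy_focal_stack(z_hat, eps=1.0, max_slices=10, coverage=0.95,
                       z_range=(0.0, 10.0), grid_size=128):
    """Greedily pick focus distances (in diopters) covering the most
    still-unsatisfied pixels, mirroring greedy_exposures() above."""
    targets = z_hat.ravel()
    grid = np.linspace(z_range[0], z_range[1], grid_size)
    covered = np.abs(grid[:, None] - targets[None, :]) <= eps
    unsatisfied = np.ones(targets.shape, dtype=bool)
    slices = []
    while len(slices) < max_slices and unsatisfied.mean() > 1.0 - coverage:
        gains = (covered & unsatisfied[None, :]).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        slices.append(float(grid[best]))
        unsatisfied &= ~covered[best]
    return slices
```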

One embodiment of the invention may be implemented as a program product for use on a computer system, such as the camera system 100 of FIG. 1, for example. One or more programs of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer, such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips, or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive, or any type of solid-state random-access semiconductor memory) on which alterable information is stored.

The invention has been described above with reference to specific embodiments, and numerous specific details are set forth to provide a more thorough understanding of the invention. Persons skilled in the art, however, will understand that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
 1. A method for a user interface for enabling control of a camera, the method comprising: displaying a tone mapped high dynamic range (HDR) image on a user interface device of the camera, wherein the user interface includes a plurality of pixels defining a display surface, and wherein the tone mapped HDR image includes an interpretation of a scene at which a camera lens of the camera is pointing; receiving user edits via an input device associated with the user interface device; sending the user edits to one or more back-end devices of the camera to perform processing operations based on the user edits; receiving an updated tone mapped HDR image from the one or more back-end devices, wherein the updated tone mapped HDR image is generated from the processing operations performed based on the user edits; and displaying the updated tone mapped HDR image on the user interface as the camera lens continues to capture frames of the scene for the one or more back-end devices to perform operations that iteratively affect the updated tone mapped HDR image.
 2. The method of claim 1, further comprising displaying on the user interface device a selection region in response to receiving user input via the input device, wherein the selection region is defined by one or more pixels included in the plurality of pixels.
 3. The method of claim 2, wherein the one or more back-end devices perform actions comprising: receiving a user edit via a user interface device that displays an interpretation of a scene at which a camera lens of the camera is pointing, wherein the user edit is based on user input that is associated with a selection region on the user interface device; and generating an edits mask based on one or more matching image patches, which are based on the user edit and a high dynamic range (HDR) image generated by the camera.
 4. The method of claim 3, wherein the back-end devices perform further actions comprising: performing one or more tone mapping operations based on the edits mask and the HDR image in order to generate a tone mapped HDR image; and performing one or more metering operations based on the edits mask and the tone mapped HDR image in order to calculate metering parameters for frame capturing operations.
 5. The method of claim 3, wherein the one or more image patches are configured for matching object textures by using an image patch that includes a subset of pixels included in the selection region.
 6. The method of claim 2, further comprising: displaying on the user interface device a real-time editor, including edit options that allow a user to select a particular edit option; and receiving a user edit via the real-time editor, and, in response, initiating one or more back-end processing operations associated with the selection region.
 7. The method of claim 6, wherein the edit options include at least one of brightness, saturation, contrast, hue, white balance, color, tone, focus, exposure, or gain.
 8. The method of claim 2, further comprising displaying on the user interface device a real-time editor, including an edit option for receiving user input that indicates a magnitude of change to at least one of brightness, saturation, contrast, hue, white balance, color, tone, focus, exposure, or gain.
 9. The method of claim 2, wherein the one or more back-end devices perform actions comprising: receiving a user edit via a user interface device that displays an interpretation of a scene at which a camera lens of the camera is pointing, wherein the user edit is based on user input that is associated with a selection region on the user interface device; and adjusting a focus on the selection region relative to one or more other regions of the scene.
 10. The method of claim 9, wherein adjusting the focus on the selection region comprises quantifying a per-pixel focus requirement for at least the pixels of the selection region.
 11. A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to control a camera by performing the steps of: displaying a tone mapped high dynamic range (HDR) image on a user interface device of the camera, wherein the user interface includes a plurality of pixels defining a display surface, and wherein the tone mapped HDR image includes an interpretation of a scene at which a camera lens of the camera is pointing; receiving user edits via an input device associated with the user interface device; sending the user edits to one or more back-end devices of the camera to perform processing operations based on the user edits; receiving an updated tone mapped HDR image from the one or more back-end devices, wherein the updated tone mapped HDR image is generated from the processing operations performed based on the user edits; and displaying the updated tone mapped HDR image on the user interface as the camera lens continues to capture frames of the scene for the one or more back-end devices to perform operations that iteratively affect the updated tone mapped HDR image.
 12. The computer-readable storage medium of claim 11, further comprising displaying on the user interface device a selection region in response to receiving user input via the input device, wherein the selection region is defined by one or more pixels included in the plurality of pixels.
 13. The computer-readable storage medium of claim 12, wherein the one or more back-end devices perform actions comprising: receiving a user edit via a user interface device that displays an interpretation of a scene at which a camera lens of the camera is pointing, wherein the user edit is based on user input that is associated with a selection region on the user interface device; and generating an edits mask based on one or more matching image patches, which are based on the user edit and a high dynamic range (HDR) image generated by the camera.
 14. The computer-readable storage medium of claim 13, wherein the back-end devices perform further actions comprising: performing one or more tone mapping operations based on the edits mask and the HDR image in order to generate a tone mapped HDR image; and performing one or more metering operations based on the edits mask and the tone mapped HDR image in order to calculate metering parameters for frame capturing operations.
 15. The computer-readable storage medium of claim 13, wherein the one or more image patches are configured for matching object textures by using an image patch that includes a subset of pixels included in the selection region.
 16. The computer-readable storage medium of claim 12, further comprising: displaying on the user interface device a real-time editor, including edit options that allow a user to select a particular edit option; and receiving a user edit via the real-time editor, and, in response, initiating one or more back-end processing operations associated with the selection region.
 17. The computer-readable storage medium of claim 16, wherein the edit options include at least one of brightness, saturation, contrast, hue, white balance, color, tone, focus, exposure, or gain.
 18. The computer-readable storage medium of claim 12, wherein the one or more back-end devices perform actions comprising: receiving a user edit via a user interface device that displays an interpretation of a scene at which a camera lens of the camera is pointing, wherein the user edit is based on user input that is associated with a selection region on the user interface device; and adjusting a focus on the selection region relative to one or more other regions of the scene.
 19. The computer-readable storage medium of claim 18, wherein adjusting the focus on the selection region comprises quantifying a per-pixel focus requirement for at least the pixels of the selection region.
 20. A user interface device for enabling control of a camera, the user interface device comprising: a viewfinder device configured to display a tone mapped high dynamic range (HDR) image on a user interface device of the camera, wherein the user interface includes a plurality of pixels defining a display surface, wherein the tone mapped HDR image includes an interpretation of a scene at which a camera lens of the camera is pointing; and an editor device configured to receive user edits via an input device associated with the user interface device, and to send the user edits to one or more back-end devices of the camera to perform processing operations based on the user edits, wherein the viewfinder device is further configured to receive an updated tone mapped HDR image from the one or more back-end devices, wherein the updated tone mapped HDR image is generated from the processing operations performed based on the user edits, and to display the updated tone mapped HDR image on the user interface as the camera lens continues to capture frames of the scene for the one or more back-end devices to perform operations that iteratively affect the updated tone mapped HDR image.